
Conversation

@baohe-zhang

@baohe-zhang baohe-zhang commented Apr 29, 2020

What changes were proposed in this pull request?

Add a new class HybridStore to make the history server faster when loading event files. When rebuilding the application state from event logs, HybridStore first writes data to an InMemoryStore and uses a background thread to dump the data to LevelDB once the writing to the InMemoryStore is completed. HybridStore trades extra memory for faster content serving, so it is only safe to enable when the cluster is not under heavy load.
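The flow, roughly (a minimal sketch only, not the actual class added by this PR; the class name, the switchToLevelDB/getStore helpers, and the callback are illustrative):

```scala
import java.util.concurrent.ConcurrentHashMap

import scala.collection.JavaConverters._

import org.apache.spark.util.kvstore.{InMemoryStore, KVStore, LevelDB}

// Minimal sketch of the idea; not the real HybridStore.
class HybridStoreSketch(inMemory: InMemoryStore, levelDB: LevelDB) {
  // Classes written so far, so the background dump knows what to copy.
  private val klasses = ConcurrentHashMap.newKeySet[Class[_]]()
  @volatile private var current: KVStore = inMemory

  // During event-log replay, every write goes to the in-memory store.
  def write(value: AnyRef): Unit = {
    klasses.add(value.getClass)
    current.write(value)
  }

  // Called once replay finishes: copy everything to LevelDB in a background
  // thread and switch reads over only after the copy succeeds.
  def switchToLevelDB(onDone: Option[Throwable] => Unit): Unit = {
    val dumper = new Thread(() => {
      try {
        klasses.asScala.foreach { klass =>
          val it = inMemory.view(klass).closeableIterator()
          try {
            while (it.hasNext()) levelDB.write(it.next())
          } finally {
            it.close()
          }
        }
        current = levelDB   // reads are now served from LevelDB
        inMemory.close()    // release the heap
        onDone(None)
      } catch {
        case e: Exception => onDone(Some(e)) // on failure, keep serving from memory
      }
    })
    dumper.setDaemon(true)
    dumper.start()
  }

  def getStore(): KVStore = current
}
```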

Why are the changes needed?

HybridStore can greatly reduce the event log loading time, especially for large log files. In general, it gives a 4x - 6x UI loading speed improvement for large log files. Detailed results are shown in the comments.

Does this PR introduce any user-facing change?

This PR adds new configs spark.history.store.hybridStore.enabled and spark.history.store.hybridStore.maxMemoryUsage.
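For example, in spark-defaults.conf (the values below are just illustrative):

```
spark.history.store.hybridStore.enabled         true
spark.history.store.hybridStore.maxMemoryUsage  2g
```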

How was this patch tested?

A test suite for HybridStore is added. I also manually tested it on 3.1.0 on macOS.

This is a follow-up to the work done by Hieu Huynh in 2019.

@jiangxb1987
Contributor

So you mean that for loading 1 GB of event logs, the HybridKVStore takes 63s while the current LevelDB takes 69s?

@baohe-zhang
Author

baohe-zhang commented Apr 30, 2020

@jiangxb1987 From my testing results, when loading a 1 GB file, HybridKVStore takes 23s to parse (that means users only need to wait 23s to see the UI), while LevelDB takes 69s. The 40s is the total time for parsing a file and transferring the data to LevelDB. Sorry for the confusion.

@jiangxb1987
Contributor

@baohe-zhang Good to know the result, it sounds great! cc @rednaxelafx

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Apr 30, 2020

Test build #122105 has finished for PR 28412 at commit 0164960.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class HybridKVStore implements KVStore

@SparkQA

SparkQA commented Apr 30, 2020

Test build #122119 has finished for PR 28412 at commit 7e3c39e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

HyukjinKwon commented May 1, 2020

cc @vanzin FYI

Contributor

@HeartSaVioR HeartSaVioR left a comment

Thanks for the contribution. The concept looks great.

I've skimmed the implementation, and I got the feeling that it is a bit complicated because of the dual writes in the foreground and background. The complication is even exposed to the FsHistoryProvider.

Given that the HybridKVStore is only used for loading event logs and is not further exposed for modification (AFAIK - please correct me if I'm missing something), I see a chance to simplify the logic: dump the in-memory KVStore to LevelDB (in the background) once loading into the in-memory KVStore is done. (The KVStore should be read-only and reject writes.) We wouldn't need to deal with dual writes on concurrent threads, only with switching the KVStore correctly.

This wouldn't add latency to serving content - it just needs more seconds to write to LevelDB, in other words, more seconds during which the memory usage is kept. (Not sure how long that would be.)

What do you think?

// compaction may touch the file(s) which app rebuild wants to read
// compaction wouldn't run in short interval, so try again...
logWarning(s"Exception occurred while rebuilding app $appId - trying again...")
lease.rollback()
Contributor

lease.rollback() was enough to throw out the intermediate LevelDB KVStore in the temporary directory when it fails to load for any reason. That doesn't look like the case for the Hybrid KVStore.

Author

I didn't quite understand this comment, could you elaborate? I think in the current implementation, if any exception is thrown while migrating data to LevelDB, the hybrid kvstore will not switch to LevelDB and its getStore() method will always return the in-memory kvstore.

Contributor

@HeartSaVioR HeartSaVioR May 5, 2020

I missed that the instance will be dereferenced, my bad.

It might still be ideal to clean up the instance explicitly, as the instance may hold on to a huge amount of memory. Please note that this can happen in the middle of processing.

(The previous logic may not have fully cleaned up the instance either, so just my 2 cents.)

Author

Hybrid kvstore is now cleaned up explicitly.


// TODO: Maybe need to do other check to see if there's enough memory to
// use inMemoryStore.
if (hybridKVStoreEnabled) {
Contributor

I have a feeling that too many HybridKVStore implementation details are exposed here; they could be abstracted away if we had a proper interface for a disk-based, loading-only KVStore for the SHS.

E.g., suppose a new interface on top of KVStore receives a Lease on initialize() and exposes commit() & rollback() to handle the implementation details for each condition.

Author

Agree. I will address it.

Author

I found it difficult to pass a Lease to the hybrid kvstore, since the Lease class is only visible within the history package.

Author

The code related to creating a hybrid kvstore has now been refactored into a function.

logWarning(s"Failed to switch to use LevelDb for app" +
s" $appId / ${attempt.info.attemptId}")
levelDB.close()
throw e
Contributor

  1. Here we should roll back the lease as well.
  2. The SHS assumes the KVStore is loaded properly when the caller method returns. If I understand correctly, the exception thrown here doesn't propagate to the caller method, which means the HybridKVStore should still serve the content one way or another - from the in-memory store in this case. (Be cautious about memory usage.)

Author

Will address 1. For 2, if switching to LevelDB fails, the hybrid kvstore will keep using the in-memory kvstore as its underlying kvstore.

Author

1 is addressed.

.createWithDefault(true)

val HYBRID_KVSTORE_ENABLED = ConfigBuilder("spark.history.store.hybridKVStore.enabled")
.version("3.0.1")
Contributor

  1. This needs a doc describing the functionality, as well as a proper caution about memory usage.
  2. "3.1.0" is correct as of now.

Author

Will address it.

Author

Addressed.

@tgravescs
Contributor

The concept seems good; a couple of high-level questions without me having looked at the detailed code.
I assume the number of threads to read is still spark.history.fs.numReplayThreads?
It looks like you only have one thread to write to LevelDB. Is this enough for it to not get backed up? Let's say we have a very active history server and users loading a lot of large files - does the memory balloon before it can be flushed to LevelDB?

@HeartSaVioR
Contributor

Also worth mentioning that memory usage should be kept under control; there's no restriction for now.

@SparkQA

SparkQA commented May 6, 2020

Test build #122332 has finished for PR 28412 at commit e6707bd.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

Let's first discuss a plan for how to address the major concerns in the comments, especially how to restrict the overall memory usage. I think that's a blocker for production use.

@baohe-zhang
Author

@HeartSaVioR One way in my mind is that we can monitor the memory usage of the SHS. If the memory usage or event log size exceeds a threshold (e.g., over 50% of Xmx), we can use LevelDB to parse the event log instead of the hybrid kvstore.

@HeartSaVioR
Contributor

I'm not sure that's simple to do. Concurrent loads of applications can happen in the SHS, right? The default value of spark.history.retainedApplications is 50, which means at most 50 apps can be loaded into the cache at the same time.

@baohe-zhang
Author

Another way is to keep a thread-safe variable called availableMemory in FsHistoryProvider. The initial value can be set as a percentage of Xmx. When we parse a file via the hybrid kvstore, we subtract an approximate memory usage from availableMemory, and when the hybrid store switches to LevelDB, we add this approximate memory usage back. When availableMemory is below a threshold, we can disable the hybrid kvstore.
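Roughly something like this (the class and method names are illustrative, not actual code):

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustrative sketch: lease an approximate amount of memory before parsing with
// the hybrid kvstore, and release it once the store has switched to LevelDB.
class AvailableMemoryTracker(maxUsage: Long) {
  private val currentUsage = new AtomicLong(0L)

  // Returns false when there is not enough room, so the caller falls back to LevelDB.
  def tryLease(approxUsage: Long): Boolean = {
    if (currentUsage.addAndGet(approxUsage) > maxUsage) {
      currentUsage.addAndGet(-approxUsage)
      false
    } else {
      true
    }
  }

  // Called when the hybrid store has switched to LevelDB (or failed and been closed).
  def release(approxUsage: Long): Unit = currentUsage.addAndGet(-approxUsage)
}
```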

@SparkQA

SparkQA commented May 6, 2020

Test build #122333 has finished for PR 28412 at commit 34e4564.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 6, 2020

Test build #122335 has finished for PR 28412 at commit 141feed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baohe-zhang
Author

@tgravescs What I saw in FsHistoryProvider is that "spark.history.fs.numReplayThreads" is used to create a thread pool for mergeApplicationListing and compaction. These threads don't seem to be responsible for parsing a single complete event log, but I am not sure if my understanding is correct. In the hybrid kvstore, each instance has one writer thread to write data to LevelDB.

@HeartSaVioR
Contributor

The idea is similar to HistoryServerDiskManager, so it makes sense in general. We may need concrete answers to these questions to move forward:

  1. How will we guarantee this area of memory is used only for the Hybrid KV store, to prevent OOM? (Or is there no guard, and end users have to deal with providing enough heap memory?)

  2. How do we calculate the approximate memory usage? I guess it would be safe to take the event log file size as the approximate size, but that would reserve a huge amount of memory for a single app. (That may not be a problem from a safety perspective, but it is much less efficient.)

@SparkQA

SparkQA commented May 12, 2020

Test build #122553 has finished for PR 28412 at commit 3162a8a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baohe-zhang
Author

baohe-zhang commented May 12, 2020

@HeartSaVioR
There are some updates in the previous commit.

  1. If the hybrid kvstore is enabled, FsHistoryProvider will try it first; if it encounters any exceptions, it will close the hybrid kvstore and fall back to LevelDB (see the sketch after this list).
  2. A memory usage tracker is added. The idea is similar to HistoryServerDiskManager. If there is not enough available memory, LevelDB will be used instead of the hybrid kvstore.
  3. The memory usage approximation is the same as the one in HistoryServerDiskManager. I monitored the memory usage for uncompressed log files and found that the peak memory usage during rebuilding is usually 1/4 to 1/2 of the log file size, so I think estimating the memory usage as half of the log file size is safe.
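The fallback in (1), roughly (a sketch only; the factory functions here are hypothetical stand-ins, not the actual FsHistoryProvider methods):

```scala
import org.apache.spark.util.kvstore.KVStore

// Sketch of the fallback: try the hybrid kvstore first, and parse directly into
// LevelDB if it is disabled or fails for any reason.
def loadStore(
    hybridStoreEnabled: Boolean,
    createHybridStore: () => KVStore,
    createLevelDBStore: () => KVStore): KVStore = {
  if (hybridStoreEnabled) {
    try {
      createHybridStore()   // parse into memory; dump to LevelDB in the background
    } catch {
      case _: Exception => createLevelDBStore()   // hybrid store is closed by its creator
    }
  } else {
    createLevelDBStore()
  }
}
```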

@SparkQA

SparkQA commented May 12, 2020

Test build #122555 has finished for PR 28412 at commit a796231.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@baohe-zhang
Author

@HeartSaVioR @tgravescs
These are the measurements of memory usage for zstd-compressed log files:

  • 200 jobs, 400 tasks for each job: 8.6 MB file size, 50.5 MB memory usage.
  • 100 jobs, 400 tasks for each job: 4.5 MB file size, 31.4 MB memory usage.
  • 50 jobs, 400 tasks for each job: 2.4 MB file size, 15.7 MB memory usage.
  • 10 jobs, 400 tasks for each job: 816 KB file size, 4.7 MB memory usage.

@HeartSaVioR
Contributor

Thanks for the update. There's a case where it goes up close to 10x but not quite 10x, so 10x seems like a safe factor to apply.
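The estimate could then be roughly (purely illustrative, based on the numbers quoted in this thread rather than the final code):

```scala
// Uncompressed logs peaked around half the file size in memory in the measurements
// above, while zstd-compressed logs expanded up to ~10x.
def approximateMemoryUsage(eventLogSize: Long, compressed: Boolean): Long = {
  if (compressed) eventLogSize * 10 else eventLogSize / 2
}
```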

val it = inMemoryStore.view(klass).closeableIterator()
while (it.hasNext()) {
  levelDB.write(it.next())
}
Contributor

Drive-by comment, given I added something similar in an in-house patch:
Add a write(Iterator<E> values) to the kv store - this should make this switch order(s) of magnitude faster.

Author

Thanks for the information, I will try adding a write(Iterator<E> values) in this PR.

Contributor

@mridulm mridulm Jul 8, 2020

If this helps, this is what I had written up for LevelDB - for the in-memory store, the default list traversal + write is good enough:

  @Override
  public void write(List<?> values) throws Exception {
    Preconditions.checkArgument(values != null && !values.isEmpty(),
      "Non-empty values required.");

    // Group by class, in case there are values from different classes in the iterator
    // Typical usecase is for this to be a single class.
    for (Map.Entry<? extends Class<?>, ? extends List<?>> entry :
            values.stream().collect(Collectors.groupingBy(Object::getClass)).entrySet()) {

      final Iterator<?> valueIter = entry.getValue().iterator();
      final Iterator<byte[]> serializedValueIter;

      {
        // deserialize outside synchronized block
        List<byte[]> list = new ArrayList<>(entry.getValue().size());
        for (Object value : entry.getValue()) {
          list.add(serializer.serialize(value));
        }
        serializedValueIter = list.iterator();
      }

      final Class<?> valueClass = entry.getKey();
      final LevelDBTypeInfo ti = getTypeInfo(valueClass);

      // Batching updates per type
      synchronized (ti) {
        final LevelDBTypeInfo.Index naturalIndex = ti.naturalIndex();
        final Collection<LevelDBTypeInfo.Index> indices = ti.indices();

        try (WriteBatch batch = db().createWriteBatch()) {
          while (valueIter.hasNext()) {
            final Object value = valueIter.next();

            assert serializedValueIter.hasNext();
            final byte[] serializedObject = serializedValueIter.next();

            Object existing;
            try {
              existing = get(naturalIndex.entityKey(null, value), valueClass);
            } catch (NoSuchElementException e) {
              existing = null;
            }

            PrefixCache cache = new PrefixCache(value);
            byte[] naturalKey = naturalIndex.toKey(naturalIndex.getValue(value));
            for (LevelDBTypeInfo.Index idx : indices) {
              byte[] prefix = cache.getPrefix(idx);
              idx.add(batch, value, existing, serializedObject, naturalKey, prefix);
            }
          }
          assert !serializedValueIter.hasNext();
          db().write(batch);
        }
      }
    }
  }

Author

Thanks, this is helpful!

Author

@baohe-zhang baohe-zhang Jul 9, 2020

Hi @mridulm, I adapted your code and used it for the in-memory store to LevelDB switching, but only saw a small improvement in switching time. I am not sure if something is wrong somewhere.

| jobs and tasks per job | original switching time | switching time with write(Iterator iter) |
| --- | --- | --- |
| 2 jobs, 400 tasks per job | 1s | 1s |
| 10 jobs, 400 tasks per job | 2s | 1s |
| 50 jobs, 400 tasks per job | 4s | 4s |
| 100 jobs, 400 tasks per job | 8s | 7s |
| 200 jobs, 400 tasks per job | 16s | 13s |
| 500 jobs, 400 tasks per job | 37s | 34s |
| 1000 jobs, 400 tasks per job | 65s | 58s |
| 5 jobs, 100000 tasks per job | 90s | 84s |

The code:

        for (klass <- klassMap.keys().asScala) {
          val it = inMemoryStore.view(klass).closeableIterator()
          levelDB.write(it)
        }
  public <T> void write(Iterator<T> iter) throws Exception {
    Preconditions.checkArgument(iter != null, "Non-empty values required.");

    List<T> values = new ArrayList<>();
    iter.forEachRemaining(values::add);

    // Group by class, in case there are values from different classes in the iterator
    // Typical usecase is for this to be a single class.
    for (Map.Entry<? extends Class<?>, ? extends List<?>> entry :
            values.stream().collect(Collectors.groupingBy(Object::getClass)).entrySet()) {

      final Iterator<?> valueIter = entry.getValue().iterator();
      final Iterator<byte[]> serializedValueIter;

      {
        // deserialize outside synchronized block
        List<byte[]> list = new ArrayList<>(entry.getValue().size());
        for (Object value : entry.getValue()) {
          list.add(serializer.serialize(value));
        }
        serializedValueIter = list.iterator();
      }

      final Class<?> valueClass = entry.getKey();
      final LevelDBTypeInfo ti = getTypeInfo(valueClass);

      // Batching updates per type
      synchronized (ti) {
        final LevelDBTypeInfo.Index naturalIndex = ti.naturalIndex();
        final Collection<LevelDBTypeInfo.Index> indices = ti.indices();

        try (WriteBatch batch = db().createWriteBatch()) {
          while (valueIter.hasNext()) {
            final Object value = valueIter.next();

            assert serializedValueIter.hasNext();
            final byte[] serializedObject = serializedValueIter.next();

            Object existing;
            try {
              existing = get(naturalIndex.entityKey(null, value), valueClass);
            } catch (NoSuchElementException e) {
              existing = null;
            }

            PrefixCache cache = new PrefixCache(value);
            byte[] naturalKey = naturalIndex.toKey(naturalIndex.getValue(value));
            for (LevelDBTypeInfo.Index idx : indices) {
              byte[] prefix = cache.getPrefix(idx);
              idx.add(batch, value, existing, serializedObject, naturalKey, prefix);
            }
          }
          assert !serializedValueIter.hasNext();
          db().write(batch);
        }
      }
    }
  }

I think using multiple threads to write data to LevelDB might shorten the switching time, but it would introduce more overhead to the SHS.

Contributor

It is a function of how loaded your disk is, the IOPS it can sustain, and the transactions LevelDB can do concurrently.

Author

Makes sense. I am testing it on a Mac which has an SSD, and the disk is not busy. I think the improvement might be more obvious on an HDD or a busy disk. @HeartSaVioR @tgravescs do we need to add batch-write support for LevelDB in this PR?

Contributor

I don't think it's mandatory. You can file another issue as an "improvement" for this, but IMHO working on it is completely optional for you. I think we have already asked for quite a lot.

Author

Sounds good. We can improve that afterward.

@tgravescs
Contributor

yeah, 10x definitely seems safe as most of the numbers are closer to 8x for zstd. I'm fine with leaving the current logic for small files; we can always follow up with more enhancements to skip them later if we see that it's causing a lot of load. @HeartSaVioR I'm not sure if that is what you were agreeing with, or whether your suggestion was to change it here to skip the small ones? I think you were OK either way but wanted to clarify.

@HeartSaVioR
Contributor

HeartSaVioR commented Jul 9, 2020

Yeah, I'm OK either way. I don't think either way would bring major issues in reality.

Contributor

@HeartSaVioR HeartSaVioR left a comment

LGTM as it is. I'll wait for @tgravescs to finalize the review.

@HeartSaVioR
Contributor

retest this, please

@mridulm
Contributor

mridulm commented Jul 10, 2020 via email

@SparkQA

SparkQA commented Jul 10, 2020

Test build #125522 has finished for PR 28412 at commit b71a923.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

retest this, please

@SparkQA

SparkQA commented Jul 10, 2020

Test build #125598 has finished for PR 28412 at commit b71a923.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

test this please

@SparkQA

SparkQA commented Jul 10, 2020

Test build #125616 has finished for PR 28412 at commit b71a923.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

test this please

@SparkQA

SparkQA commented Jul 10, 2020

Test build #125635 has finished for PR 28412 at commit b71a923.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

@tgravescs Do you plan another round of review, or OK as it is?

@tgravescs
Contributor

looks good

@HeartSaVioR
Contributor

retest this, please

@SparkQA

SparkQA commented Jul 14, 2020

Test build #125841 has finished for PR 28412 at commit b71a923.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

Thanks! Merged into master.

@baohe-zhang
Author

Thanks a lot for your reviews!

HeartSaVioR pushed a commit that referenced this pull request Jul 22, 2020
… of HybridStore

### What changes were proposed in this pull request?
The idea is to improve the performance of HybridStore by adding batch-write support to LevelDB. #28412 introduces HybridStore. HybridStore writes data to InMemoryStore first and uses a background thread to dump the data to LevelDB once the writing to InMemoryStore is completed. In the comments section of #28412, mridulm mentioned that using batch writes can improve the performance of this dumping process, and he wrote the code of writeAll().

### Why are the changes needed?
I compared the HybridStore switching time between one-by-one write and batch write on an HDD. When the disk is free, the batch write gives around a 25% improvement, and when the disk is 100% busy, the batch write gives a 7x - 10x improvement.

when the disk is at 0% utilization:
| log size, jobs and tasks per job   | original switching time, with write() | switching time with writeAll() |
| ---------------------------------- | ------------------------------------- | ------------------------------ |
| 133m, 400 jobs, 100 tasks per job  | 16s                                   | 13s                            |
| 265m, 400 jobs, 200 tasks per job  | 30s                                   | 23s                            |
| 1.3g, 1000 jobs, 400 tasks per job | 136s                                  | 108s                           |

when the disk is at 100% utilization:
| log size, jobs and tasks per job  | original switching time, with write() | switching time with writeAll() |
| --------------------------------- | ------------------------------------- | ------------------------------ |
| 133m, 400 jobs, 100 tasks per job | 116s                                  | 17s                            |
| 265m, 400 jobs, 200 tasks per job | 251s                                  | 26s                            |

I also ran some write-related benchmark tests in LevelDBBenchmark.java and measured the total time of writing 1024 objects. The tests were conducted with the disk at 0% utilization.

| Benchmark test           | with write(), ms | with writeAll(), ms |
| ------------------------ | ---------------- | ------------------- |
| randomUpdatesIndexed     | 213.06           | 157.356             |
| randomUpdatesNoIndex     | 57.869           | 35.439              |
| randomWritesIndexed      | 298.854          | 229.274             |
| randomWritesNoIndex      | 66.764           | 38.361              |
| sequentialUpdatesIndexed | 87.019           | 56.219              |
| sequentialUpdatesNoIndex | 61.851           | 41.942              |
| sequentialWritesIndexed  | 94.044           | 56.534              |
| sequentialWritesNoIndex  | 118.345          | 66.483              |

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested.

Closes #29149 from baohe-zhang/SPARK-32350.

Authored-by: Baohe Zhang <[email protected]>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Member

@gatorsmile gatorsmile left a comment

#28412 does not have any unit test cases verifying the new behaviors and configurations.

This is required for all feature PRs, even if the features are added to improve performance.

@baohe-zhang
Author

@gatorsmile Thanks for reminding me of that! I will add unit tests for these HybridStore-related PRs.

HeartSaVioR pushed a commit that referenced this pull request Aug 25, 2020
…HistoryServerMemoryManager

### What changes were proposed in this pull request?
This pull request adds 2 test suites for the 2 new classes HybridStore and HistoryServerMemoryManager, which were created in #28412. This pull request also makes some minor changes in these 2 classes to expose some variables for testing. Besides the 2 suites, this pull request adds a unit test in FsHistoryProviderSuite to test parsing logs with HybridStore.

### Why are the changes needed?
Unit tests are needed for new features.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #29509 from baohe-zhang/SPARK-31608-UT.

Authored-by: Baohe Zhang <[email protected]>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <[email protected]>